Continuing Education
- Asia > Japan > Honshū > Tōhoku > Iwate Prefecture > Morioka (0.05)
- Europe > Finland (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- (2 more...)
- North America > United States > Texas > Travis County > Austin (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.05)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- (7 more...)
- Research Report (0.68)
- Instructional Material > Online (0.44)
- North America > United States > Iowa > Johnson County > Iowa City (0.14)
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > Canada (0.04)
- Information Technology (0.68)
- Health & Medicine > Consumer Health (0.55)
- Education > Educational Setting > Continuing Education (0.47)
Meta-Reinforcement Learning with Self-Modifying Networks
Deep Reinforcement Learning has demonstrated the potential of neural networks tuned with gradient descent for solving complex tasks in well-delimited environments. However, these neural systems are slow learners producing specialized agents with no mechanism to continue learning beyond their training curriculum. On the contrary, biological synaptic plasticity is persistent and manifold, and has been hypothesized to play a key role in executive functions such as working memory and cognitive flexibility, potentially supporting more efficient and generic learning abilities. Inspired by this, we propose to build networks with dynamic weights, able to continually perform self-reflexive modification as a function of their current synaptic state and action-reward feedback, rather than a fixed network configuration. The resulting model, MetODS (for Meta-Optimized Dynamical Synapses) is a broadly applicable meta-reinforcement learning system able to learn efficient and powerful control rules in the agent policy space. A single layer with dynamic synapses can perform one-shot learning, generalize navigation principles to unseen environments and demonstrates a strong ability to learn adaptive motor policies, comparing favorably with previous meta-reinforcement learning approaches.
Addressing the Plasticity-Stability Dilemma in Reinforcement Learning
Maheshwari, Mansi, Raisbeck, John C., da Silva, Bruno Castro
Neural networks have shown remarkable success in supervised learning when trained on a single task using a fixed dataset. However, when neural networks are trained on a reinforcement learning task, their ability to continue learning from new experiences declines over time. This decline in learning ability is known as plasticity loss. To restore plasticity, prior work has explored periodically resetting the parameters of the learning network, a strategy that often improves overall performance. However, such resets come at the cost of a temporary drop in performance, which can be dangerous in real-world settings. To overcome this instability, we introduce AltNet, a reset-based approach that restores plasticity without performance degradation by leveraging twin networks. The use of twin networks anchors performance during resets through a mechanism that allows networks to periodically alternate roles: one network learns as it acts in the environment, while the other learns off-policy from the active network's interactions and a replay buffer. At fixed intervals, the active network is reset and the passive network, having learned from prior experiences, becomes the new active network. AltNet restores plasticity, improving sample efficiency and achieving higher performance, while avoiding performance drops that pose risks in safety-critical settings. We demonstrate these advantages in several high-dimensional control tasks from the DeepMind Control Suite, where AltNet outperforms various relevant baseline methods, as well as state-of-the-art reset-based techniques.
- North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
- Asia > Middle East > Jordan (0.04)
- Europe > Portugal > Braga > Braga (0.04)
Resolving Conflicts in Lifelong Learning via Aligning Updates in Subspaces
Zhou, Yueer, Wu, Yichen, Wei, Ying
Low-Rank Adaptation (LoRA) enables efficient Continual Learning but often suffers from catastrophic forgetting due to destructive interference between tasks. Our analysis reveals that this degradation is primarily driven by antagonistic directional updates where new task gradients directly oppose the historical weight trajectory. To address this, we propose PS-LoRA (Parameter Stability LoRA), a framework designed to resolve conflicts by aligning updates within the optimization subspace. Our approach employs a dual-regularization objective that penalizes conflicting directions and constrains magnitude deviations to ensure consistency with prior knowledge. Additionally, we implement a magnitude-based merging strategy to consolidate sequential adapters into a robust representation without retraining. Experiments on NLP and Vision benchmarks show that PS-LoRA outperforms state-of-the-art methods by preserving the stability of learned representations while efficiently adapting to new domains.
- North America > United States > Oregon > Multnomah County > Portland (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report > New Finding (0.46)
- Research Report > Promising Solution (0.34)
Learning Without Time-Based Embodiment Resets in Soft-Actor Critic
Farrahi, Homayoon, Mahmood, A. Rupam
When creating new reinforcement learning tasks, practitioners often accelerate the learning process by incorporating into the task several accessory components, such as breaking the environment interaction into independent episodes and frequently resetting the environment. Although they can enable the learning of complex intelligent behaviors, such task accessories can result in unnatural task setups and hinder long-term performance in the real world. In this work, we explore the challenges of learning without episode terminations and robot embodiment resets using the Soft Actor-Critic (SAC) algorithm. To learn without terminations, we present a continuing version of the SAC algorithm and show that, with simple modifications to the reward functions of existing tasks, continuing SAC can perform as well as or better than episodic SAC while reducing the sensitivity of performance to the value of the discount rate $γ$. On a modified Gym Reacher task, we investigate possible explanations for the failure of continuing SAC when learning without embodiment resets. Our results suggest that embodiment resets help with exploration of the state space in the SAC algorithm, and removing embodiment resets can lead to poor exploration of the state space and failure of or significantly slower learning. Finally, on additional simulated tasks and a real-robot vision task, we show that increasing the entropy of the policy when performance trends worse or remains static is an effective intervention for recovering the performance lost due to not using embodiment resets.
- North America > Canada > Alberta (0.14)
- Asia > Middle East > Jordan (0.04)
Arcadia: Toward a Full-Lifecycle Framework for Embodied Lifelong Learning
Gao, Minghe, Li, Juncheng, Lin, Yuze, Liu, Xuqi, Ji, Jiaming, Pan, Xiaoran, Xu, Zihan, Li, Xian, Li, Mingjie, Ji, Wei, Wei, Rong, Tang, Rui, Wang, Qizhou, Shen, Kai, Xiao, Jun, Wu, Qi, Tang, Siliang, Zhuang, Yueting
W e contend that embodied learning is fundamentally a life-cycle problem rather than a single-stage optimization. Systems that optimize only one link (data collection, simulation, learning, or deployment) rarely sustain improvement or generalize beyond narrow settings. W e introduce Arcadia, a closed-loop framework that operational-izes embodied lifelong learning by tightly coupling four stages: (1) Self-evolving exploration and grounding for autonomous data acquisition in physical environments, (2) Generative scene reconstruction and augmentation for realistic and extensible scene creation, (3) a Shared embodied representation architecture that unifies navigation and manipulation within a single multimodal backbone, and (4) Sim-from-real evaluation and evolution that closes the feedback loop through simulation-based adaptation. This coupling is non-decomposable: removing any stage breaks the improvement loop and reverts to one-shot training. Arcadia delivers consistent gains on navigation and manipulation benchmarks and transfers robustly to physical robots, indicating that a tightly coupled lifecycle: continuous real-world data acquisition, generative simulation update, and shared-representation learning, supports lifelong improvement and end-to-end generalization. W e release standardized interfaces enabling reproducible evaluation and cross-model comparison in reusable environments, positioning Arcadia as a scalable foundation for general-purpose embodied agents.
- Research Report (0.82)
- Instructional Material (0.61)
An energy-efficient spiking neural network with continuous learning for self-adaptive brain-machine interface
The number of simultaneously recorded neurons follows an exponentially increasing trend in implantable brain-machine interfaces (iBMIs). Integrating the neural decoder in the implant is an effective data compression method for future wireless iBMIs. However, the non-stationarity of the system makes the performance of the decoder unreliable. To avoid frequent retraining of the decoder and to ensure the safety and comfort of the iBMI user, continuous learning is essential for real-life applications. Since Deep Spiking Neural Networks (DSNNs) are being recognized as a promising approach for developing resource-efficient neural decoder, we propose continuous learning approaches with Reinforcement Learning (RL) algorithms adapted for DSNNs. Banditron and AGREL are chosen as the two candidate RL algorithms since they can be trained with limited computational resources, effectively addressing the non-stationary problem and fitting the energy constraints of implantable devices. To assess the effectiveness of the proposed methods, we conducted both open-loop and closed-loop experiments. The accuracy of open-loop experiments conducted with DSNN Banditron and DSNN AGREL remains stable over extended periods. Meanwhile, the time-to-target in the closed-loop experiment with perturbations, DSNN Banditron performed comparably to that of DSNN AGREL while achieving reductions of 98% in memory access usage and 99% in the requirements for multiply- and-accumulate (MAC) operations during training. Compared to previous continuous learning SNN decoders, DSNN Banditron requires 98% less computes making it a prime candidate for future wireless iBMI systems.
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Energy (1.00)
- Education > Educational Setting > Continuing Education (1.00)
MADRA: Multi-Agent Debate for Risk-Aware Embodied Planning
Wang, Junjian, Zhao, Lidan, Zhang, Xi Sheryl
Ensuring the safety of embodied AI agents during task planning is critical for real-world deployment, especially in household environments where dangerous instructions pose significant risks. Existing methods often suffer from either high computational costs due to preference alignment training or over-rejection when using single-agent safety prompts. To address these limitations, we propose MADRA, a training-free Multi-Agent Debate Risk Assessment framework that leverages collective reasoning to enhance safety awareness without sacrificing task performance. MADRA employs multiple LLM-based agents to debate the safety of a given instruction, guided by a critical evaluator that scores responses based on logical soundness, risk identification, evidence quality, and clarity. Through iterative deliberation and consensus voting, MADRA significantly reduces false rejections while maintaining high sensitivity to dangerous tasks. Additionally, we introduce a hierarchical cognitive collaborative planning framework that integrates safety, memory, planning, and self-evolution mechanisms to improve task success rates through continuous learning. We also contribute SafeAware-VH, a benchmark dataset for safety-aware task planning in VirtualHome, containing 800 annotated instructions. Extensive experiments on AI2-THOR and VirtualHome demonstrate that our approach achieves over 90% rejection of unsafe tasks while ensuring that safe-task rejection is low, outperforming existing methods in both safety and execution efficiency. Our work provides a scalable, model-agnostic solution for building trustworthy embodied agents.
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)